Forming A Random Field via Stochastic
Cliques: From Random Graphs to Fully 
Connected Random Fields 

M. J. Shafiee, A. Wong and P. Fieguth 

Abstract —Random fields have remained a topic of great interest over past 
decades for the purpose of structured inference, especially for problems 
such as image segmentation. The local nodal interactions commonly used
in such models often suffer from the short-boundary bias problem, which is
tackled primarily through the incorporation of long-range nodal interactions.
However, computational tractability becomes a significant issue
when incorporating such long-range nodal interactions, particularly when a
large number of long-range nodal interactions (e.g., fully-connected random
fields) are modeled.

In this work, we introduce a generalized random field framework based
around the concept of stochastic cliques, which addresses the issue of
computational tractability when using fully-connected random fields by
stochastically forming a sparse representation of the random field. The
proposed framework allows for efficient structured inference using
fully-connected random fields without any restrictions on the potential
functions that can be utilized. Several realizations of the proposed
framework using graph cuts are presented and evaluated, and experimental
results demonstrate that the proposed framework can provide competitive
performance for the purpose of image segmentation when compared to existing
fully-connected and principled deep random field frameworks.

Index Terms —Fully Connected Random Field, Random Graph, Stochastic 
Cliques, Graph Cuts, Markov Random Fields 
1 Introduction

Probabilistic graphical modeling using random fields such as
Markov random fields (MRFs) and conditional random fields
(CRFs) has become very prominent and widely used for structured
inference. A particular structured inference challenge
often tackled using random fields, given the promising results,
is that of image segmentation [?], [?], where the use of
random fields facilitates the incorporation of spatial information
to improve modeling accuracy. Conventional random
field models used to incorporate such spatial information have
typically made use of short-range, local nodal interactions.

The pairwise potential in such models is formulated with a
label compatibility function which penalizes the assignment of
different labels within small locally-connected nodal neighborhoods,
leading to the short-boundary bias problem [?], which manifests
itself in the form of excessively smoothed segmentation
results when applied to the problem of image segmentation.

Strong evidence [?] has shown that increasing the
number of long-range interactions in the model can attenuate
the short-boundary bias problem, with the extreme case being
fully-connected nodal interactions, which are computationally
intractable.

Motivated by this, the short-boundary bias problem associated
with conventional random field models has been tackled
in two different directions: i) the use of fully-connected random
fields via a new data representation (i.e., dense conditional random
fields (DCRFs)) and specific potential function restrictions
to achieve computational tractability, and ii) the introduction of
new higher-order pairwise penalty functions to account for
elongated boundaries.

• The authors are with the Department of Systems Design Engineering,
University of Waterloo, Waterloo, Ontario, Canada.

E-mail: {mjshafiee, a28wong, pfieguth}@uwaterloo.ca

Manuscript received ..., 2015; revised ....

The first direction for tackling the short-boundary bias
problem (i.e., the excessive smoothness over boundaries), first
proposed by Krähenbühl and Koltun [?], involves the use
of fully-connected CRFs within an efficient structured inference
framework to account for all possible nodal interactions.
This new structured inference framework (DCRF) addressed
the computational tractability problem associated with
fully-connected random fields by restricting to specific potential
functions (i.e., mainly Gaussian) and incorporating a new
data representation (i.e., Permutohedral lattices) [10]. Further
extensions [11], [12], [13] to this framework were proposed to
relax certain assumptions and limitations associated with it,
but required feature space transformations such that a pairwise
potential under a Gaussian kernel is obtained in order to take
advantage of Permutohedral lattices for efficient inference. As
such, this approach limits a major advantage of CRFs, which is
the ability to use arbitrary potential functions when modeling.

The second direction, as proposed by Jegelka and
Bilmes [14] and Kohli et al. [15] (which is known as the principled
deep random field model), involves the introduction of
new higher-order pairwise penalty functions that change the
cost of the edges that constitute a cut in the segmentation.
As such, these models penalize the number of types of label
discontinuities instead of penalizing the number of label
discontinuities (which is used in conventional CRFs). A potential
limitation of this second approach is that it does not leverage
long-range nodal interactions to the same extent as the first
approach, where all possible nodal interactions are taken into
account, and as such may be more limiting compared to the
first approach when dealing with complex scenes where complex
boundary structures with similar characteristics manifest
themselves at large distances away from each other.

While both directions hold significant promise, here we
investigate a different direction for addressing the short-boundary
bias problem through the use of fully-connected CRFs (thus
taking advantage of all possible nodal interactions) in a
computationally tractable manner without being restricted to specific
potential functions when modeling. This approach provides
efficient structured inference using fully-connected CRFs
by combining random graph theory with
random field theory. More specifically, we are motivated by
fundamental work [17], [18] in graph sampling and random
graph theory where it was shown that it is possible to extract
sufficient information from dense graphs by examining
stochastically sparsified versions of such graphs. As such, here we
introduce a novel approach to probabilistic graphical modeling
where the underlying dense graph of a fully-connected CRF
is stochastically sparsified, thus addressing the computational
complexity associated with structured inference using
fully-connected CRFs without needing any additional restrictions or
assumptions that can limit modeling power.

The work presented here extends significantly beyond our
preliminary works [19], [20] in the following manner. Although
the previous works [19], [20] introduced and analyzed the
concept of the stochastic clique in specific situations, here
a generalized probabilistic graphical modeling framework is
introduced that unifies all previous and preliminary works
based on the concept of stochastic cliques, where a
fully-connected CRF is stochastically sparsified through the stochastic
formation of a subset of cliques within the fully connected
random field to be harnessed in the inference procedure. It
will be illustrated that such stochastically sparsified
representations yield approximately the same behavior as that
of the fully-connected CRF from which they came, and
as such should provide approximately the same results when
applied to the problem of image segmentation while yielding
significantly reduced computational costs. Furthermore,

• A number of different realizations of the proposed
modeling framework are introduced based on different
f-divergences, of which our preliminary works are
limited, special cases.

• A novel abstraction strategy is introduced to reduce
the cost of computing f-divergences in the stochastic
sparsification process, further improving computational
efficiency within the proposed realizations.

This paper is organized as follows. The proposed probabilistic
graphical modeling framework based on the concept of
stochastic cliques is presented and discussed in Section 2.
Experimental results in the context of image segmentation are
presented and discussed in Section 3. Finally, conclusions are
drawn and future work is discussed in Section 4.

2 Methodology 

In this section, the theory behind the proposed probabilistic
graphical modeling framework based on the concept of
stochastic cliques will be explained as follows. First, CRFs
and random graph theory are explained in relation to stochastic
cliques. Second, the fundamental theory behind stochastic
cliques is presented, and the conditions satisfied by the
stochastically sparsified representation of the fully-connected
CRF produced by the proposed framework, such that its
behavior is approximately the same as that of the fully-connected CRF
from which it came, are discussed. Third, realizations of
the proposed framework based on different f-divergences are
introduced. Fourth, the abstraction strategy used to improve
computational efficiency when computing f-divergences in the
stochastic sparsification process is presented.

2.1 Conditional Random Fields 

In the context of CRFs, the problem of image segmentation
is typically formulated as a Maximum A Posteriori (MAP)
problem, where the probability of random field Y given
observations X is factorized by potential functions following the
Hammersley-Clifford theorem and Gibbs distribution [?]:

$$P(Y \mid X) = \prod_i \phi_i(y_{c_i}, X), \tag{1}$$

where $y_{c_i}$ is a subset of random variables in the random field
Y defined by the clique structure $c_i$ and X is the observations.
The potential function $\phi_i(\cdot)$ is an arbitrary non-negative
function defining the relationship among random variables
$y_j \in y_{c_i}$ based on observations X. An exponential representation
satisfies the non-negativity constraint while still allowing
arbitrary potential functions; hence $P(Y \mid X)$ can be
formulated as

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\big(-\psi(Y, X)\big), \tag{2}$$

where $Z(X)$ is the partition function or normalization constant
and $\psi(\cdot)$ is the potential function (also referred to as the energy
function in some of the random field literature [?], [?]).

The potential function $\psi(\cdot)$ is factorized based upon clique
structures as a combination of single cliques (i.e., unary potential
function) and higher-order cliques:

$$\psi(Y, X) = \sum_{i=1}^{n} \psi_u(y_i, X) + \sum_{\varphi \in C} \psi_p(y_\varphi, X), \tag{3}$$

where $\psi_u(\cdot)$ is the unary potential function and $\psi_p(\cdot)$ is the
spatial potential function, with C being the set of higher-order
clique structures. The higher-order cliques can contain
several random variables based on the neighborhood size.
However, the pairwise clique (whose corresponding term is
called the pairwise potential function) is a commonly-used clique
structure in the literature [?], [?], [?]. The unary potential
encodes the likelihood model of each random variable $y_i$ and
its corresponding measurement, while the pairwise potential
represents the relationship between random variables within
a clique structure $\varphi \in C$ and incorporates the spatial
information into the model. The pairwise potential $\psi_p(\cdot)$ penalizes
the assignment of different labels to random variables in a
clique based on some associated properties (e.g., in the case of
image segmentation, based on appearance cues such as color
similarity). The main problem of this approach in conventional
(local) random field models is the excessive smoothing of object
boundaries due to the use of only local, short-range nodal
interactions in the model (e.g., 4- or 8-connected local
neighborhoods). The pairwise potential penalizes the energy function if
two neighboring nodes are assigned different labels, which causes
the smoothing problem known as short-boundary bias [?]. A
promising approach for addressing this issue is the use of
long-range nodal interactions. However, long-range nodal
interactions increase computational complexity exponentially, and as
such should be utilized intelligently to manage computational
complexity.
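As an illustration of Eq. (3), the following minimal sketch evaluates the energy of a labeling as the sum of unary terms and pairwise terms that are charged only when two nodes of a pairwise clique receive different labels. The function name `energy`, the table layout of `unary`, and the callable `weight` are assumptions introduced here for illustration; the specific pairwise penalty is not prescribed by the text above.

```python
def energy(labels, unary, edges, weight):
    """Evaluate psi(Y, X) = sum_i psi_u(y_i, X) + sum_{(i, j)} psi_p(y_i, y_j, X).

    labels : list of integer labels, one per node
    unary  : unary[i][k] is the cost of giving label k to node i
    edges  : iterable of pairwise cliques (i, j)
    weight : callable (i, j) -> penalty paid when labels[i] != labels[j]
             (hypothetical; stands in for an appearance-based penalty)
    """
    total = sum(unary[i][labels[i]] for i in range(len(labels)))
    for i, j in edges:
        if labels[i] != labels[j]:
            total += weight(i, j)
    return total


if __name__ == "__main__":
    unary = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    edges = [(0, 1), (1, 2)]
    print(energy([0, 0, 1], unary, edges, lambda i, j: 0.5))
```

Adding long-range cliques only lengthens `edges`; the cost of one energy evaluation grows with the number of cliques, which is why the number of active cliques matters for tractability.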

Here, we explore tackling the problem of computational
complexity by constructing a sparse graph representation
stochastically from the fully-connected random field by
randomly sampling the most informative nodal interactions.
Inspired by random graph theory [?], active cliques are formed
stochastically in the inference step to represent the
fully-connected CRF with a sparse graph model that provides
approximately the same results as the fully-connected CRF. By
combining random graph theory with random field theory
in such a way, the resulting sparse graph retains all of the
properties of a CRF, and as such can be used in all of the same
structured inference scenarios in which CRFs are used. It will be
shown that the constructed sparse graph model should have
the same behavior as the fully connected CRF and generate
approximately the same results.

2.2 Random Graphs 

Here, the underlying sparse graph representation is
constructed stochastically from the fully-connected CRF based
on distribution probabilities, and as such generates a random
graph structure. In general, a random graph can be defined
as a probability distribution over graphs [?], and there are
several approaches to generating a random graph. Gilbert [16]
represented a random graph as $\mathcal{G}(n, p)$, or $\mathcal{G}_{n,p}$, such that each
edge connectivity is determined independently based on the
selection probability p. The Erdős-Rényi model represents
a random graph as $\mathcal{G}(n, m)$, where m determines the number
of connected edges of the graph, and the selection probability
p is computed to provide exactly m edges for the graph.
The Erdős-Rényi model is an effective model for extracting
the essential behavior of various graph properties, which are
explained in this section.

Fig. 1. Example realizations of random graphs for some interesting cases
based on the selection probability p.

The generated random graph $\mathcal{G}_{n,p}$ achieves specific
structures based on the selection probability p. Some interesting
cases based on p include:

• $p = o(\frac{1}{n})$: $\mathcal{G}_{n,p}$ is the disjoint union of trees.

• $p \sim \frac{c}{n}$: $\mathcal{G}_{n,p}$ contains cycles of different sizes for $0 < c < 1$.
All connected components are either trees or unicyclic
components, and almost all nodes ($n - o(n)$) are in
components that are trees.

• $p < \frac{1}{n}$: $\mathcal{G}_{n,p}$ is dramatically different compared to when
$p > \frac{1}{n}$. The largest component has size $O(\log n)$ when
$p < \frac{1}{n}$, while most of the small components merge into a
giant component of size $O(n)$ and the remaining
components are of size $O(\log n)$ when $p > \frac{1}{n}$. The transition
at $p \sim \frac{1}{n}$ is called the double jump.

• $p = c \frac{\log n}{n}$: $\mathcal{G}_{n,p}$ is almost surely connected for
$c > 1$.

• $p \sim \omega(n) \frac{\log n}{n}$: $\mathcal{G}_{n,p}$ is almost surely connected
and the degrees of almost all nodes are asymptotically
equal when $\omega(n) \to \infty$.

Figure 1 presents example realizations of random graphs
illustrating the structural behavior of the aforementioned
cases for different values of the selection probability p. Two
effects of p on the random graph structure, namely the point at
which the graph becomes connected (i.e., $p = c \frac{\log n}{n}$) and the point
at which the number of connectivities is adequate to model the fully
connected graph sparsely, are the properties used
to define the proposed stochastic clique structure and to
represent the fully-connected CRF by a sparse random graph
model within the proposed probabilistic graphical
modeling framework.
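The regimes listed above can be explored empirically with a minimal sketch such as the following, which samples a Gilbert graph $\mathcal{G}(n, p)$ and counts its connected components with a small union-find. The function names `sample_gnp` and `count_components`, the graph size, and the particular values of p are illustrative assumptions, not part of the original experiments.

```python
import math
import random


def sample_gnp(n, p, rng):
    """Sample the edge list of G(n, p): each pair is kept independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def count_components(n, edges):
    """Count connected components of an n-node graph using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    components = n
    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1
    return components


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    regimes = [("0.5/n", 0.5 / n), ("2/n", 2.0 / n), ("3 log(n)/n", 3 * math.log(n) / n)]
    for label, p in regimes:
        edges = sample_gnp(n, p, rng)
        print(label, "edges:", len(edges), "components:", count_components(n, edges))
```

Running the demo for several seeds gives a feel for how the number of components drops as p moves from below $\frac{1}{n}$ to above $\frac{\log n}{n}$; the exact counts vary from sample to sample.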

The random graph model was generalized
by Kovalenko [?], in which the graph can be encoded
by $\mathcal{G}(n, p_{ij})$, where $\{i, j\}$ are two different nodes in the graph.
In this model, the connectivity of each possible nodal
pair is determined based on an individual probability $p_{ij}$. The
stochastic clique structure presented here is inspired by this
generalized random graph model, such that a clique is formed
based on a distribution derived from the corresponding
observations at its endpoint nodes.

2.3 Stochastic Cliques 

The stochastic clique structure presented within the proposed
generalized probabilistic graphical framework provides a new
approach to representing the underlying graph of a
fully-connected CRF with a sparse random graph model while
preserving the properties of the original fully-connected CRF.
First explored in a preliminary, special-case form in [?],
the generalized, unified theory behind stochastic cliques can
be described as follows. Given a fully-connected CRF where
each node i is a neighbor of all other nodes in the graph:

$$\mathcal{N}_i = \{ j \mid j = 1:n,\ j \neq i \}, \tag{4}$$

where $|\mathcal{N}_i| = n - 1$ and n is the number of random variables
of the random field, the set of active clique structures C is
stochastically defined as

where i is a node in the underlying graph $\mathcal{G}$ of the random
field, $\mathcal{P}(\mathcal{N}_i)$ is the powerset of $\mathcal{N}_i$ (i.e., the set of all subsets
of the neighbors of node i, each candidate subset being an element
of $\mathcal{P}(\mathcal{N}_i)$), and $1^S_{\{\cdot\}}$ represents a
stochastic indicator function determining whether the subset of
nodes can form a clique. Here, node i is guaranteed to be an element
of each of its cliques in C, while the other nodes of
the clique are stochastically selected based on the jth
element of $\mathcal{P}(\mathcal{N}_i)$.

The stochastic clique indicator function $1^S_{\{\cdot\}}$ is a sparsifier
function which transforms the underlying fully-connected
graph of the random field to a sparsified graph such that
the informative nodal interactions are preserved for the
inference procedure. In other words, $1^S_{\{\cdot\}}$ samples informative
cliques from the set of all cliques in a fully-connected CRF
to determine the active cliques for the inference step. The
proposed indicator function extracts a distribution probability
from the observations to decide whether the clique should be
constructed and can be formulated as

$$1^S_{\{\cdot\}} = [F(X, c) > \gamma \cdot U(0, 1)], \tag{6}$$

where $[\cdot]$ is the Iverson bracket, $\gamma$ is a sparsity factor,
$U(0, 1)$ is a uniform distribution over the unit interval, and $F(\cdot)$
is a connectivity measure among the random variables in the
candidate clique c.
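The behavior of the indicator in Eq. (6) can be sketched as follows for pairwise cliques. The helper names `stochastic_clique_indicator` and `sample_active_pairs`, and the dense matrix `F` of precomputed connectivity values, are assumptions made for illustration. Under this form, a clique with non-negative connectivity value F is kept with probability $\min(1, F / \gamma)$, since $U(0, 1)$ falls below $F / \gamma$ with exactly that probability.

```python
import random


def stochastic_clique_indicator(connectivity, gamma, rng):
    """Iverson bracket [F > gamma * U(0, 1)] for one candidate clique."""
    return connectivity > gamma * rng.random()


def sample_active_pairs(F, gamma, rng):
    """Draw the active pairwise cliques from a symmetric connectivity matrix F.

    F[i][j] holds the (hypothetical, precomputed) connectivity measure
    between nodes i and j; only pairs passing the indicator are kept.
    """
    n = len(F)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if stochastic_clique_indicator(F[i][j], gamma, rng)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    F = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.4], [0.1, 0.4, 0.0]]
    print(sample_active_pairs(F, gamma=1.0, rng=rng))
```

Larger values of the sparsity factor lower every acceptance probability and therefore yield a sparser set of active cliques.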

2.3.1 Condition Satisfaction 

In this work, the inference framework is implemented in a
graph cuts framework (i.e., s-t minimum cut) [32]. Due to the
randomness involved in representing the underlying graph of
the fully-connected CRF with a sparse graph representation via
the concept of stochastic cliques, it is important to show that
the sparse graph representation is at least connected
(Connectedness) to satisfy the Gibbs distribution [?]. It is also important
to show that the nodes in the sparse graph representation
of the fully-connected CRF obtained via the aforementioned
stochastic clique formation process can be partitioned into
approximately the same sets of nodes as the original
fully-connected graph of the fully-connected CRF by the use of the
s-t minimum cut approach, with a limited variation range on
the minimum cut values (Minimum Cut), since the goal of the
proposed framework is to address the computational complexity
associated with structured inference using fully connected
CRFs without impeding performance.

• Connectedness. It was asserted by Kovalenko [?] that
the connectedness of the graph $\mathcal{G}(n, p_{ij})$ is satisfied if
all probabilities $p_{ij}$ are at least as large as $\frac{\log n}{n}$. It is
worth noting that this value is very small if the
random field is constructed for tackling problems where
the number of random variables is large, such as the
problem of image segmentation. As an example, for an
image that is n = 400 x 300, $p_{ij}$ only needs to be greater
than $\frac{\log n}{n} = 9.7460 \times 10^{-5}$ to satisfy the connectedness
condition, which corresponds to having roughly 12 neighbours
per pixel. As such, the connectedness condition is easily
satisfied for the purpose of image segmentation.

• Minimum Cut. Karger [17] and Benczúr and Karger [18]
proposed random sampling techniques for approximating
problems that involve cuts and flows in graphs. They
proved that given a dense graph $\mathcal{H}$ and an error parameter
$\epsilon < 1$, there is a sparse graph $\mathcal{G}$ which has $O(\frac{n \log n}{\epsilon^2})$ edges and
in which the value of each cut is within $(1 \pm \epsilon)$ times the value of
the corresponding cut in $\mathcal{H}$.

As such, this theorem asserts that the upper bound of the
sampling probability should be $p \leq \frac{\log n}{\epsilon^2 n}$ to obtain a
sparse graph with a bounded minimum cut error of $\epsilon$. This
theorem introduces a trade-off between the computational
complexity of the graph and the minimum cut error $\epsilon$.
Therefore, it is possible to sparsify a fully connected graph
by specifying a fixed error rate for the cut accuracy. Using
the previous example of an image that is n = 400 x 300,
to represent a fully connected random field as a sparse
representation via stochastic sparsification with an error
parameter of $\epsilon = 0.1$, the number of edges in the
underlying sparse graph should be less than or equal to
$\frac{n \log n}{\epsilon^2} \approx 1.4034 \times 10^{8}$ (or alternatively a random graph
generated with a selection probability of p < 0.0097) to
satisfy the minimum cut condition. The implications of said
theorem lead us to the interesting idea that a random field
with an underlying sparse graph randomly sampled from
a fully-connected CRF can result in the same s-t minimum
cut partitioning as the original fully-connected CRF.

The two aforementioned conditions determine the lower
(connectedness condition, $\frac{\log n}{n}$) and upper (minimum cut
condition, $\frac{\log n}{\epsilon^2 n}$) bounds of the probability p considering a
limited error for the result; within these bounds the resulting sparse
graph representation obtained via stochastic clique formation
is a good approximation of the fully-connected CRF with a
limited error bound. It is noted that there is a trade-off between
the accuracy and computational complexity of the sparse graph
which should be optimized based on the application.
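The numerical values quoted for the 400 x 300 example can be reproduced with the short sketch below. It assumes the natural logarithm and unit constants in both bounds, i.e., a lower bound of $\frac{\log n}{n}$ and an upper bound of $\frac{\log n}{\epsilon^2 n}$; these choices are assumptions made here because they match the figures in the text, and the function names are illustrative.

```python
import math


def connectedness_lower_bound(n):
    """Lower bound on p for connectedness, assumed here to be log(n) / n."""
    return math.log(n) / n


def min_cut_edge_budget(n, eps):
    """Number of edges n log(n) / eps^2 associated with cut error eps."""
    return n * math.log(n) / eps ** 2


def min_cut_upper_bound(n, eps):
    """Upper bound on p, assumed here to be log(n) / (eps^2 * n)."""
    return math.log(n) / (eps ** 2 * n)


if __name__ == "__main__":
    n = 400 * 300
    eps = 0.1
    p_low = connectedness_lower_bound(n)
    print("lower bound on p:", p_low)
    print("expected neighbours per pixel:", p_low * n)
    print("edge budget:", min_cut_edge_budget(n, eps))
    print("upper bound on p:", min_cut_upper_bound(n, eps))
```

For this image size the admissible range of p spans two orders of magnitude, which leaves considerable room for trading accuracy against computational cost.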

2.3.2 Graph Representation 

Let us now mathematically define the sparse graphical
representation of the fully-connected CRF obtained via stochastic
clique formation. Graph $\mathcal{H}(\mathcal{V}, \mathcal{F})$ is the realization of the
original underlying graph of the fully-connected CRF, where $\mathcal{V}$ is
the set of nodes in the graph which represent the states $y_i \in Y$,
$\mathcal{F}$ is the set of edges of the graph with $|\mathcal{F}| = \frac{n(n-1)}{2}$, and n
is the number of nodes. Each node $v_i \in \mathcal{V}$ in the graph $\mathcal{H}(\cdot)$
represents a random variable $y_i$ associated with an observation
$x_i \in X$. Corresponding to graph $\mathcal{H}(\mathcal{V}, \mathcal{F})$, there is a graph
$\mathcal{G}(\mathcal{V}, \mathcal{E})$ with the same set of nodes $\mathcal{V}$ and the set of edges $\mathcal{E}$,
$|\mathcal{E}| < |\mathcal{F}|$, constructed via stochastic clique formation. $\mathcal{G}(\cdot)$ is
the realization of a random graph [16] based on the underlying
behavior of the stochastic clique indicator (6).

As demonstrated in Figure 2, each node in the graph is
connected to all other nodes, while the active cliques participating
in the inference procedure are determined based on probability
distributions. The probability of two nodes forming a clique is
different for each pair of nodes. For example, two nodes with
higher values of $F(\cdot)$ (recall that $F(\cdot)$ is a connectivity measure
between two nodes) have a higher probability of constructing an
active clique in the inference step than two nodes with lower
values of $F(\cdot)$. However, there is still a possibility for two
nodes i and k with lower $F(\cdot)$ to form a clique, as illustrated
in Figure 2.

Fig. 2. An illustration of the sparse graphical representation of the
fully-connected CRF obtained via stochastic clique formation. The clique
connectivities for node i are stochastically formed based on the connectivity
measures (i.e., $F(\cdot)$) between node i and all other nodes in the graph. For
example, two nodes with a high value of $F(\cdot)$ (e.g., nodes i
and j, with a high $F(x_i, x_j)$) have a higher probability of connectedness than
two nodes i and k, which have a lower value of $F(\cdot)$ (a low $F(x_i, x_k)$). For
better visualization, only the potential connectivities for the center node i
are shown. The blue dashed lines show the fully-connected nature of
the random field, while the solid black lines indicate the pairwise active
cliques in the inference step.

2.4 Realizations 

While the proposed framework is a general approach that can 
be applied to a large number of structured inference problems, 
here we examine a realization of the framework for the purpose 
of image segmentation. 

For tackling the image segmentation problem, assume that
the set of active cliques C is a combination of pairwise cliques in the
random field; therefore, each selected subset of $\mathcal{N}_i$ consists of only
one random variable $j \neq i$ of the random field. Let each node i be
characterized based on the observations of the spatially surrounding
neighbors of node i, as encoded by a distribution function. Based on these
assumptions, the stochastic clique indicator can be reformulated as

$$1^S_{\{i,j\}} = [F(S_i, S_j) > \gamma \cdot U(0, 1)], \tag{7}$$

where $S_i$ and $S_j$ are the encoded neighborhood statistics for
two nodes i and j in the pairwise clique, respectively.

Since the utilized observation is statistical information, $F(\cdot)$
in (7) can be an f-divergence measure $D(\cdot)$ between the two
distributions $S_i$ and $S_j$. This approach is useful for the problem of
image segmentation as it enables the stochastic clique indicator
function to sample informative cliques as active cliques in the
inference step based on their encoded neighborhood statistics, which in
the case of images can characterize textural information.

Changing the connectivity measure $F(\cdot)$ in the stochastic
clique indicator $1^S_{\{i,j\}}$ can change the behavior of the stochastic
clique indicator. Here, we present three different realizations
of the proposed generalized probabilistic framework based on
different f-divergence measures.

2.4.1 Bregman Divergence 

For the first realization, a Bregman divergence [33] is utilized
to formulate $D(\cdot)$ in (7) such that

$$D_\phi(S_i, S_j) = \phi(S_i) - \phi(S_j) - \langle S_i - S_j, \nabla\phi(S_j) \rangle, \tag{8}$$

where $\phi(\cdot)$ is a continuously-differentiable, real-valued and
strictly convex function.

A limited, special case of this realization of the proposed
framework was first explored in [?], where $\phi(v) = \|v\|^2$ and
$S_i$ is encoded by a Dirac delta distribution:

(9) 

with returning the measurement corresponding to node i. 
This derivation reduces the computation to a (squared) Euclidean
distance between two nodes (pixels) in the random field [?]. This
similarity measure is a popular one utilized in random field
approaches [?], [10].
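The reduction of the Bregman divergence in Eq. (8) to a squared Euclidean distance for $\phi(v) = \|v\|^2$ can be checked numerically with the following minimal sketch. Node measurements are represented as plain lists of floats, and all function names are illustrative assumptions.

```python
def bregman_divergence(s_i, s_j, phi, grad_phi):
    """D_phi(s_i, s_j) = phi(s_i) - phi(s_j) - <s_i - s_j, grad phi(s_j)>."""
    inner = sum((a - b) * g for a, b, g in zip(s_i, s_j, grad_phi(s_j)))
    return phi(s_i) - phi(s_j) - inner


def squared_norm(v):
    """phi(v) = ||v||^2."""
    return sum(x * x for x in v)


def grad_squared_norm(v):
    """Gradient of ||v||^2, i.e., 2 v."""
    return [2.0 * x for x in v]


def squared_euclidean(s_i, s_j):
    """Squared Euclidean distance between two measurement vectors."""
    return sum((a - b) ** 2 for a, b in zip(s_i, s_j))


if __name__ == "__main__":
    x_i, x_j = [1.0, 2.0], [4.0, 6.0]
    print(bregman_divergence(x_i, x_j, squared_norm, grad_squared_norm))
    print(squared_euclidean(x_i, x_j))
```

Expanding Eq. (8) for this choice of $\phi$ gives $\|S_i\|^2 - \|S_j\|^2 - 2\langle S_i - S_j, S_j \rangle = \|S_i - S_j\|^2$, which is what the two printed values compare.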

2.4.2 Kullback-Leibler Divergence 

For the second realization, a Kullback-Leibler divergence is
utilized to formulate $D(\cdot)$ in (7) such that

$$D_{KL}(S_i, S_j) = \int S_i(x) \ln \frac{S_i(x)}{S_j(x)} \, dx. \tag{10}$$

The nodes surrounded by similar structures should have a
higher probability of being connected in the underlying graph.
Therefore, each node can be affected by other nodes with
similar structure and pixel intensity properties. The similarity
can be encoded by statistics extracted from neighboring nodes.

In several situations the underlying neighborhood statistics
may not be well characterized using a parametric distribution
model. Therefore, in this realization, we assume that the
neighborhood statistics follow a non-parametric distribution (e.g.,
a histogram) which characterizes the surrounding appearance of
the pixel (node). We introduce the second realization based on
a non-parametric variant of the Kullback-Leibler divergence,
where $S_i$ and $S_j$ are represented using discrete histograms:

$$D_{KL}(S_i, S_j) = \sum_{l=1}^{K} S_{i,l} \ln \frac{S_{i,l}}{S_{j,l}}, \tag{11}$$

where K is the number of histogram bins and $S_{i,l}$ and $S_{j,l}$ are the
lth discrete bins of histograms $S_i$ and $S_j$.
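A minimal sketch of the histogram form in Eq. (11) is given below. The treatment of empty bins is not specified in the text; here, bins with $S_{i,l} = 0$ contribute zero and empty bins of $S_j$ are floored at a small constant `eps`, both of which are assumptions made for illustration, as are the function names.

```python
import math


def normalize_histogram(counts):
    """Turn raw bin counts into a discrete distribution that sums to one."""
    total = float(sum(counts))
    return [c / total for c in counts]


def kl_divergence(s_i, s_j, eps=1e-12):
    """Discrete Kullback-Leibler divergence sum_l s_i[l] * ln(s_i[l] / s_j[l]).

    Bins where s_i[l] == 0 contribute nothing; s_j[l] is floored at eps
    (an assumed smoothing choice) to avoid division by zero.
    """
    return sum(p * math.log(p / max(q, eps)) for p, q in zip(s_i, s_j) if p > 0)


if __name__ == "__main__":
    s_i = normalize_histogram([8, 1, 1])
    s_j = normalize_histogram([2, 4, 4])
    print(kl_divergence(s_i, s_j), kl_divergence(s_j, s_i))
```

The two printed values generally differ, which reflects the asymmetry of the Kullback-Leibler divergence and should be kept in mind when it is used as a pairwise connectivity measure.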

2.4.3 Hellinger Distance 

The Kullback-Leibler divergence is an f-divergence when
$f(v) = v \ln(v)$:

$$D(P \,\|\, Q) = \int f\left(\frac{dP}{dQ}\right) dQ. \tag{12}$$

To show the impact of different functions $f(v)$ on the results,
as the last realization, $F(\cdot)$ is modeled within an f-divergence
framework such that $f(v) = (\sqrt{v} - 1)^2$. The new function $f(\cdot)$
turns the f-divergence into a Hellinger distance which, where
$S_i$ and $S_j$ are represented using discrete histograms, can be
formulated as:

$$D_H(S_i, S_j) = \sum_{l=1}^{K} \left( \sqrt{S_{i,l}} - \sqrt{S_{j,l}} \right)^2, \tag{13}$$

where K is the number of histogram bins and $S_{i,l}$ and $S_{j,l}$ are the
lth discrete bins of histograms $S_i$ and $S_j$.
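The relationship between the general f-divergence in Eq. (12) and the Hellinger form can be checked numerically with the sketch below, which evaluates the discrete f-divergence $\sum_l Q_l f(P_l / Q_l)$ for $f(v) = (\sqrt{v} - 1)^2$ and compares it with the sum of squared differences of square roots. The unnormalized form of the Hellinger distance (no factor of one half and no outer square root) and the function names are assumptions made here.

```python
import math


def f_divergence(p, q, f):
    """Discrete f-divergence sum_l q[l] * f(p[l] / q[l]) over bins with q[l] > 0."""
    return sum(q_l * f(p_l / q_l) for p_l, q_l in zip(p, q) if q_l > 0)


def hellinger(p, q):
    """Unnormalized squared Hellinger form: sum_l (sqrt(p[l]) - sqrt(q[l]))^2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def f_hellinger(v):
    """Generator f(v) = (sqrt(v) - 1)^2."""
    return (math.sqrt(v) - 1.0) ** 2


def f_kl(v):
    """Generator f(v) = v ln(v), valid for v > 0."""
    return v * math.log(v)


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.25, 0.75]
    print(f_divergence(p, q, f_hellinger), hellinger(p, q))
    print(f_divergence(p, q, f_kl))
```

Unlike the Kullback-Leibler divergence, this measure is symmetric in its two arguments and remains finite when a bin of either histogram is empty.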

2.5 Connectivity Computation via Abstraction 

To construct the sparse graph representation of the
fully-connected CRF based on the stochastic clique structure within
the proposed framework, the one-to-one connectivity measure
$F(\cdot)$ must be computed for all pairs of nodes in the fully-connected
CRF. The computational complexity of this procedure increases
quadratically with the number of random variables (e.g., the
number of pixels in the case of image modeling). However,
some of these similarity evaluations are redundant, since there
can be many similar nodes in the random field which have
the same one-to-one similarity value with other nodes in the
random field. To significantly reduce the computational
complexity of computing connectivity measures, we are inspired
by the work of Nagamochi and Ibaraki [34], [35], where it was
shown that if an edge in the graph is not in the minimum cut,
then its corresponding nodes must be on the same side of the
minimum cut result. Figure 3 demonstrates the aforementioned
theorem visually. As such, the connectivity measures between
a node l and connected nodes that are similar to each other
on the opposite side of the cut can be approximated as the
same, such that the resulting graph has the same minimum
cut value as the original graph. Motivated by this, we propose
an abstraction strategy where we approximate the one-to-one
connectivity measures at significantly reduced computational
complexity when compared to directly computing all
connectivity measures.

Fig. 3. Nagamochi and Ibaraki theorem [?]: If an edge in the graph is
not in the minimum cut, then its corresponding nodes must be on the same
side of the minimum cut result. It is assumed that the red dashed line is
the minimum cut of the graph. In our example, the edge e is not crossed by
the cut; therefore, the two blue nodes corresponding to edge e are on the same
side of the cut. As such, the connectivity measures between a node l and
connected nodes that are similar to each other on the opposite side of the
cut can be approximated as the same, such that the resulting graph has the
same minimum cut value as the original graph. The proposed abstraction
strategy approximates the connectivity measure F between node l and
node i, as seen in the left graph, by the expected value of F between node l
and the set of nodes $X_c = \{i, j\}$ (denoted by $E[F(x_l, X_c)]$) in the right
graph. In this example, after applying the abstraction strategy, $F_1(\cdot)$ and
$F_3(\cdot)$ in the left graph are replaced by A(·) in the right graph.

Instead of computing the one-to-one connectivity measure
$F(\cdot)$ between a node and all other nodes, the abstraction
strategy computes the expected value of $F(\cdot)$ between the node and
a group of nodes that are similar to each other:

$$F(x_l, x_i)\big|_{x_i \in X_c} \approx E[F(x_l, X_c)], \tag{14}$$

$$F(x_l, X_c) = \{ F(x_l, x_i) \mid x_i \in X_c \}, \tag{15}$$

where $X_c$ is a set of similar nodes in the graph, $x_i \in X_c$ is a
particular node in the group of similar nodes, and $E[\cdot]$
encodes the expectation function. The value of $E[F(x_l, X_c)]$ is
approximately equal to the actual value of $F(x_l, x_i)$ since
$X_c$ is the combination of nodes that are similar to each other.
Furthermore, even if this approximation does deviate from the
actual value of $F(x_l, x_i)$, the nodes that are similar to each
other are on the same side of the cut with high probability since
they are grouped together as $X_c$ and have zero value of $F(\cdot)$
between each other while having larger values (greater than zero,
or zero for exactly similar ones) of $F(\cdot)$ with nodes outside of
$X_c$. As such, computing the expected value instead of the actual
value does not change the relationship between the nodes
inside the set $X_c$ and the outside nodes; therefore, the individual
final cut edges are not changed based on the aforementioned
theorem. It is worth noting that the intra-edges in the group
of similar nodes have very large connectivity measures such
that their corresponding edges have very low probability of being
a cut edge. Therefore, the proposed abstraction strategy has a
very low probability of changing the actual cut edges of the
problem.

As shown in the right graph of the figure, instead of computing
the connectivity measures $F_1(\cdot)$ and $F_3(\cdot)$ between node
$l$ and nodes $i$ and $j$, respectively, the abstraction strategy
approximates these functions as the expected value $E[F(x_l, X_c)]$
based on the set of nodes $X_c$, which consists of nodes $i$ and $j$.
Using this strategy, only one computation is done to approximate
the connectivity measure between node $l$ and all nodes in the set
$X_c = \{i, j\}$.
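The idea can be illustrated with a minimal Python sketch. The function `connectivity` is a hypothetical stand-in for $F(\cdot)$ (a squared difference of scalar node features), not one of the divergence-based realizations of the framework, and evaluating $F$ once against the group mean is only one assumed way of realizing $E[F(x_l, X_c)]$ with a single computation.

```python
import numpy as np

def connectivity(a, b):
    """Hypothetical stand-in for F(.): squared difference of scalar features."""
    return (a - b) ** 2

def abstracted_connectivity(x_l, group):
    """Approximate F(x_l, x_i) for every x_i in the group with one evaluation
    against the group mean (an assumed realization of E[F(x_l, X_c)])."""
    return connectivity(x_l, np.mean(group))

x_l = 0.9                        # feature of node l
X_c = np.array([0.20, 0.22])     # features of the similar nodes i and j

exact = connectivity(x_l, X_c)                # one-to-one measures, one per node
approx = abstracted_connectivity(x_l, X_c)    # a single computation for the set
print(exact, approx)
```

Because the nodes in the group are similar, the single abstracted value stays close to each one-to-one value in this toy example.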


3 Results & Discussion 

The performance of the proposed probabilistic graphical modeling
framework was compared with that of different state-of-the-art
random field inference frameworks for the problem of interactive
image segmentation. The three different realizations of the
proposed framework discussed in Section 2.4 were evaluated to
investigate the tradeoff between the use of different
f-divergence measures. Natural images from the complex scene
saliency dataset (CSSD), the Microsoft research interactive
dataset (MRIS), and the fine structures dataset (MSRA-FS) were
used in this evaluation. The CSSD, MRIS and MSRA-FS datasets
contain 200, 50 and 30 images, respectively. Segmentation is
conducted based on user-specified areas that serve as seed points
for the object of interest and the background.

The MSRA-FS images were chosen as the validation set to find
the optimal parameters through a grid search procedure. The same
parameters are used for the different realizations of the
proposed framework to keep the comparison consistent. To
investigate the performance of the proposed framework compared
to existing state-of-the-art random field inference frameworks,
we also tested the principled deep random field (PD) framework,
which utilizes higher-order pairwise penalty functions, and the
dense CRF (DCRF), which utilizes fully-connected CRFs via
permutohedral lattices. The implementations of these two
frameworks are the source code made publicly available by their
respective authors. The reported optimal parameters of the PD
framework were consistent with the optimal solution for the
tested datasets. However, the reported optimal parameters of
DCRF did not produce the best results for the tested datasets,
so its parameters were selected through a grid search procedure.

For the proposed framework, we utilize the following pairwise
potential function $\psi_p(y_i, y_j, X)$:

$$\psi_p(y_i, y_j, X) = \theta(x_i, x_j) \cdot |y_i - y_j| \qquad (16)$$

where $\theta(x_i, x_j)$ is defined as follows for 4-connected cliques:

$$\theta(x_i, x_j) = 0.05 + 0.95 \exp\!\left(-0.5\,|x_i - x_j|^2\right) \qquad (17)$$


where a is a controlling parameter, and $\theta(x_i, x_j)$ is defined as
follows for long-range cliques:


$$\ldots\; 1 + \exp\!\left(-\rho\,|x_i - x_j|\right)$$

where $\rho$ is the controlling parameter. The use of such a
potential function illustrates the ability of the proposed
framework to utilize arbitrary potential functions, without being
limited to specific potential functions (e.g., Gaussian potentials).
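As a rough illustration of how such a pairwise potential could be evaluated, the following Python sketch implements Eq. (16) together with the 4-connected weight as the expression appears in Eq. (17). The function names are hypothetical, node observations are assumed to be scalars, the labels are assumed to be binary, and the controlling parameters mentioned in the text are not modelled.

```python
import numpy as np

def theta_local(x_i, x_j):
    """Edge weight for 4-connected cliques, following the expression in Eq. (17)."""
    return 0.05 + 0.95 * np.exp(-0.5 * np.abs(x_i - x_j) ** 2)

def pairwise_potential(y_i, y_j, x_i, x_j, theta=theta_local):
    """Eq. (16): the edge weight is paid only when the two labels differ."""
    return theta(x_i, x_j) * abs(y_i - y_j)

# Same label: no penalty, whatever the observations are.
print(pairwise_potential(1, 1, 0.1, 0.9))
# Different labels on identical observations: the largest penalty.
print(pairwise_potential(0, 1, 0.5, 0.5))
# Any other weight function can be passed in, since nothing in this
# sketch depends on the weight being Gaussian-shaped.
print(pairwise_potential(0, 1, 0.1, 0.9, theta=lambda a, b: 1.0 / (1.0 + abs(a - b))))
```

Passing the weight function as an argument mirrors the point made above: the potential is not tied to one specific functional form.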

The neighborhood statistics of each node in the image were
computed over a 5 × 5 neighborhood centered on the node of
interest in all realizations of the proposed framework.


The results reported in Section 3.1 were obtained with a
configuration of the proposed framework in which the expected
number of connectivities per node is 30 cliques. It is worth
noting that this number of cliques per node satisfies the
conditions discussed in Section 2.3.1.


As described in Section 2.5, it is necessary to determine
the set of $X_c$ ($U$) for approximating connectivity measures
using the abstraction strategy. Here, for the problem of image
segmentation and for the sake of computational efficiency, a
set of sets (denoted by $U = \{X_c \mid 1 \le c \le q\}$) is determined
by finding the optimal $q$ sets of nodes such that the $L_2$-norm
between the encoded statistics and relative positions of the
nodes within the sets and their corresponding set means is
minimized:

$$U = \operatorname*{arg\,min} \sum_{c=1}^{q} \sum_{j \in X_c} \left( \|s_j - \mu_{s,c}\|_2 + \|p_j - \mu_{p,c}\|_2 \right) \qquad (19)$$

where $s_j$ and $p_j$ are the encoded statistics and relative position
corresponding to node $j$, respectively, and $\mu_{s,c}$ and $\mu_{p,c}$ denote
the means of the encoded statistics and relative positions of
the nodes within the set $X_c$, respectively. Based on empirical
testing, $q = 500$ sets was found to provide strong segmentation
performance.
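A grouping of this kind can be sketched in Python with NumPy. The sketch below swaps in plain Lloyd's k-means on the concatenated statistics and positions, which minimizes squared distances to the set means and is therefore only a surrogate for the objective in Eq. (19); the function names, the initialization, and the fixed iteration count are assumptions, and the pairwise distance array is only practical for small inputs.

```python
import numpy as np

def assign(feats, means):
    """Index of the nearest set mean (Euclidean distance) for every node."""
    dists = np.linalg.norm(feats[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

def group_nodes(stats, positions, q, n_iter=20, seed=0):
    """Partition nodes into q sets X_c from their statistics s_j and positions p_j."""
    feats = np.hstack([stats, positions]).astype(float)
    rng = np.random.default_rng(seed)
    # Start from q distinct nodes chosen at random (fancy indexing copies them).
    means = feats[rng.choice(len(feats), size=q, replace=False)]
    for _ in range(n_iter):
        labels = assign(feats, means)
        for c in range(q):
            members = feats[labels == c]
            if len(members):              # keep the old mean if a set is empty
                means[c] = members.mean(axis=0)
    return assign(feats, means), means

# Toy example: six nodes forming two well-separated groups.
stats = [[0.10], [0.12], [0.11], [0.90], [0.88], [0.91]]
positions = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, means = group_nodes(stats, positions, q=2)
print(labels)
```

In the toy example the first three and the last three nodes end up in separate sets; the experiments above use $q = 500$ sets per image rather than two.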

All methods are examined and compared quantitatively using
three different performance metrics: i) region F1-score, ii)
boundary F1-score, and iii) intersection over union (IOU).
The F1-score is formulated as

$$F_1 = \frac{2 \cdot TP}{2 \cdot TP + FN + FP} \qquad (20)$$

where TP, FN and FP are the number of true positives,
false negatives, and false positives, respectively. Note that
the boundary F1-score is evaluated based on a 2-pixel
tolerance. IOU is the intersection of the estimated segmentation
result per class and the ground truth, divided by their union:

$$IOU = \frac{TP}{TP + FP + FN} \qquad (21)$$
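Both region metrics can be computed directly from a pair of binary masks. The following Python sketch implements Eqs. (20) and (21); the function name and the toy masks are illustrative only, and masks with no foreground at all (which would give a zero denominator) are not handled.

```python
import numpy as np

def region_scores(pred, truth):
    """Region F1-score (Eq. 20) and IOU (Eq. 21) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)      # foreground in both masks
    fp = np.sum(pred & ~truth)     # predicted foreground, actually background
    fn = np.sum(~pred & truth)     # missed foreground
    f1 = 2 * tp / (2 * tp + fn + fp)
    iou = tp / (tp + fp + fn)
    return f1, iou

pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
f1, iou = region_scores(pred, truth)   # TP = 2, FP = 1, FN = 1
print(f1, iou)
```

For a single pair of masks the two scores are tied by $F_1 = 2 \cdot IOU / (1 + IOU)$, since both are built from the same three counts; the identity need not hold exactly for scores averaged over a dataset.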


All realizations of the proposed framework were implemented
in a graph cuts framework. The connectivity measure is computed
using the proposed abstraction approach (Section 2.5) for all
realizations of the proposed framework, with the exception of
the Bregman Divergence realization, which follows the one
presented in our preliminary work and serves as a baseline
reference. From this point on, BD, HD, and KLD will denote the
Bregman Divergence, Hellinger Distance, and KL-Divergence
realizations of the proposed framework.

3.1 Experimental Results 

Tables 1 and 2 show quantitative comparisons of the tested
methods in terms of the region F1-score and the boundary
F1-score. As seen, the different realizations of the proposed
framework achieve competitive performance when compared
to the tested state-of-the-art PD and DCRF frameworks, and
even outperform them on certain datasets (e.g., the HD and KLD
realizations on CSSD). As illustrated in Table 2, the different
realizations of the proposed framework were able to preserve
boundaries as well as the regions of interest with good accuracy
when compared to the other frameworks. From Table 3, it can be
seen that the reported intersection over union (IOU) results
show a similar trend to the region F1-score results of Table 1.
The average score rows in Tables 1, 2 and 3 illustrate that the
proposed framework provides strong overall performance relative
to the other state-of-the-art approaches across the different
quantitative performance metrics.
TABLE 1

Region F1-score results. The performance of the compared methods is
demonstrated on three different datasets: CSSD, MRIS and MSRA-FS.
The time complexity is reported by averaging the running time (in
seconds) of the methods. “BD”, “HD” and “KLD” denote the Bregman
Divergence, Hellinger Distance, and KL-Divergence realizations of the
proposed framework, respectively. “M+M” stands for MATLAB with MEX
implementation.

              DCRF      PD        BD        HD        KLD
CSSD          0.8551    0.8286    0.8268    0.8625    0.8624
MRIS          0.8717    0.9032    0.8756    0.8862    0.8861
MSRA-FS       0.8764    0.8592    0.8618    0.8702    0.8707
Average       0.8677    0.8636    0.8547    0.8729    0.8730
Implement.    C++       M+M       M+M       M+M       M+M
Time (s)      0.48      17.512    5.275     2.431     2.494




TABLE 2

Boundary F1-score results. The performance of the tested frameworks is
reported for the case where a 2-pixel tolerance distance is considered
a true positive.

              DCRF      PD        BD        HD        KLD
CSSD          0.5212    0.5349    0.5235    0.5659    0.5655
MRIS          0.5452    0.6389    0.6175    0.6133    0.6121
MSRA-FS       0.5991    0.5413    0.5371    0.5731    0.5746
Average       0.5551    0.5715    0.5593    0.5841    0.5840
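A boundary score with a pixel tolerance can be sketched as follows in Python with NumPy. The boundary definition (foreground pixels with a background 4-neighbor), the square tolerance window, and the function names are assumptions made for illustration; the exact matching procedure used for the reported scores is not specified here.

```python
import numpy as np

def boundary(mask):
    """Foreground pixels with at least one background 4-neighbour.
    Pixels outside the image count as background."""
    m = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    core = m[1:-1, 1:-1]
    interior = core & m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    return core & ~interior

def dilate(mask, r):
    """Mark every pixel within a (2r+1) x (2r+1) window of a True pixel."""
    h, w = mask.shape
    m = np.pad(mask, r, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= m[dy:dy + h, dx:dx + w]
    return out

def boundary_f1(pred, truth, tol=2):
    """F1-score over boundary pixels, counting matches within tol pixels."""
    bp, bt = boundary(pred), boundary(truth)
    precision = np.sum(bp & dilate(bt, tol)) / max(np.sum(bp), 1)
    recall = np.sum(bt & dilate(bp, tol)) / max(np.sum(bt), 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

truth = np.zeros((12, 12), dtype=bool)
truth[2:8, 2:8] = True
shifted = np.zeros((12, 12), dtype=bool)
shifted[3:9, 2:8] = True          # same square moved down by one pixel
print(boundary_f1(shifted, truth, tol=2), boundary_f1(shifted, truth, tol=0))
```

With the tolerance in place, a one-pixel shift of the predicted boundary is still counted as a full match, whereas an exact-match criterion would penalize it.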


Table 1 also reports the computational run-time of the compared
frameworks. Comparing the run-time of the BD realization
(5.275 s) with that of the two other realizations (2.431 s for
HD and 2.494 s for KLD), it can be concluded that utilizing the
abstraction strategy helps to capture more informative cliques
while also decreasing the computational complexity of the graph
cuts procedure. All realizations of the proposed framework
achieved lower running times than the PD framework (17.512 s),
which, like the proposed framework, is implemented using a
combination of MATLAB and MEX. The proposed framework is
therefore reasonably fast given its implementation.
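The relative run-times implied by Table 1 follow from simple arithmetic, shown here as a short Python sketch. The dictionary simply restates the averaged times from the table; DCRF is left out of the ratios because it is a C++ implementation and therefore not directly comparable to the MATLAB with MEX implementations.

```python
# Average running times in seconds, copied from Table 1.
times = {"DCRF": 0.48, "PD": 17.512, "BD": 5.275, "HD": 2.431, "KLD": 2.494}

# How many times faster each realization is than the PD framework.
speedup_vs_pd = {name: times["PD"] / times[name] for name in ("BD", "HD", "KLD")}

# How many times faster the abstraction-based realizations are than BD.
speedup_vs_bd = {name: times["BD"] / times[name] for name in ("HD", "KLD")}

for name, ratio in speedup_vs_pd.items():
    print(f"{name}: {ratio:.1f}x faster than PD")
for name, ratio in speedup_vs_bd.items():
    print(f"{name}: {ratio:.1f}x faster than BD")
```

On these numbers, the HD and KLD realizations run roughly seven times faster than PD and roughly twice as fast as BD.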

Example segmentation results produced by the tested
frameworks for the different datasets are shown in the
accompanying figure. It can be seen that the PD framework has
difficulties in preserving boundaries in the test cases shown,
with either the background being merged with the object or parts
of the object being classified as background (particularly in the
"Tree" image (sixth row) and the "Reclining girl" image (second
row)). DCRF was able to preserve boundaries better than PD for
both the "Standing girl" (fifth row) and "Reclining girl" images;
however, the results produced by DCRF exhibited additional
segmentation artifacts, as seen in the "Monk" image (third row)
and the "Man with hat" image (fourth row). It can be observed that
the proposed framework is capable of preserving narrow and
elongated boundaries, as evidenced by the preservation of the
tree stem in the "Tree" image by the KLD and HD realizations
and of the dog's eye and nose in the "Dog" image (first row)
by all realizations of the proposed framework. Furthermore,
it can be observed that the proposed framework is capable of
dealing with scenarios characterized by complex and cluttered
backgrounds, as evidenced by the "Tree" and "Man with hat" images.

TABLE 3

Intersection over union (IOU) results. To ensure that the reported
F1-score performances are consistent, all frameworks are also compared
based on the IOU quantitative measure.

              DCRF      PD        BD        HD        KLD
CSSD          0.7626    0.7306    0.7328    0.7740    0.7739
MRIS          0.7912    0.8320    0.8057    0.8091    0.8092
MSRA-FS       0.7953    0.7287    0.7737    0.7846    0.7852
Average       0.7830    0.7637    0.7707    0.7892    0.7894


4 Conclusion

In this work, a generalized probabilistic modeling framework
based on the concept of stochastic cliques was proposed to
facilitate the use of fully-connected CRFs for structured
inference in a computationally tractable manner, without
additional restrictions or limitations being imposed on the
potential functions. It is illustrated that the proposed framework
provides competitive performance for the purpose of image
segmentation when compared to existing fully-connected random
field frameworks and the principled deep random field
framework, which are considered to be state-of-the-art among
random field frameworks for image segmentation. Although
the reported results are based on the use of the standard
graph cuts inference approach, the proposed framework can
be utilized within different inference approaches, which is a
worthy direction for future investigation.